Master Code Refactoring Execution Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Date: 2025-11-03
Status: Ready for Execution
Scope: Complete codebase refactoring (127 Python files)
Estimated Duration: 8-12 weeks (full-time)

Executive Summary

Complete code review findings across Phase 2 (15 critical files) and Phase 3 (112 remaining files) reveal 1 critical security vulnerability, 2 architectural God classes, and 150+ technical debt violations requiring systematic refactoring.

Review Completion Status

✅ Phase 2 Complete: 15/15 critical path files (100%)
✅ Phase 3 Complete: 112/112 remaining files (100%)
✅ Total Reviewed: 127/127 files (100%)

Findings Summary

Critical Issues (P0 - Blockers):
1. 🔴 SQL Injection - base_repository.py (SECURITY BLOCKER)
2. 🔴 God Class - api_enrichment.py (2068 lines, 800-line function)
3. 🔴 God Function - schedulers.py (131-line function)

High Priority (P1):
- 5 files of roughly 500-800 lines (95% to 167% over the 300-line limit)
- 10+ functions 100-215 lines long
- Inherited SQL injection in 3 repository subclasses

Medium Priority (P2):
- 30+ files 300-500 lines
- 50+ functions 50-100 lines
- 30+ files missing error handling

Total Technical Debt: 150+ violations

Phased Execution Strategy

Sprint 0: Security Critical (Week 1) - BLOCKER

Duration: 3-5 days
Priority: 🔴 MUST COMPLETE BEFORE ANY OTHER WORK

Goal: Eliminate SQL injection vulnerability before proceeding with any other refactoring.

Task 0.1: Fix SQL Injection (2 days)

Files: base_repository.py, event_repository.py, participant_repository.py, unmatched_channel_repository.py

Implementation:
1. Add ALLOWED_TABLES whitelist validation (4 hours)
2. Add TABLE_SCHEMAS column validation (4 hours)
3. Test all repository subclasses (2 hours)
4. Security audit test suite (2 hours)
5. Migration planning for soft delete (2 hours)
6. Deploy security fix (2 hours)

Deliverable: SQL injection vulnerability eliminated, soft delete pattern implemented

Success Criteria:
- ✅ All table names validated against whitelist
- ✅ All column names validated against schemas
- ✅ 25+ security tests passing
- ✅ No hard DELETE statements in codebase
- ✅ Security audit complete

Reference: See 2025-11-03-sprint1-security-fixes-plan.md for detailed implementation steps
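The whitelist approach in Task 0.1 can be sketched as follows. This is a minimal illustration, not the project's actual code: the table and column names in TABLE_SCHEMAS and the helper build_select are hypothetical, and the real validation belongs in base_repository.py as described in the referenced plan. The idea is that identifiers (which cannot be bound as parameters) are checked against a fixed set, while values are always passed as bound parameters.

```python
# Hypothetical schema data; the real table and column names live in the codebase.
TABLE_SCHEMAS: dict[str, frozenset[str]] = {
    "events": frozenset({"id", "name", "start_time", "record_status"}),
    "participants": frozenset({"id", "name", "record_status"}),
}
ALLOWED_TABLES = frozenset(TABLE_SCHEMAS)


def validate_table(table: str) -> str:
    """Return the table name unchanged, or raise if it is not whitelisted."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"Table not allowed: {table!r}")
    return table


def validate_columns(table: str, columns: list[str]) -> list[str]:
    """Return the columns unchanged, or raise if any is not in the table schema."""
    allowed = TABLE_SCHEMAS[validate_table(table)]
    unknown = [col for col in columns if col not in allowed]
    if unknown:
        raise ValueError(f"Columns not allowed for {table!r}: {unknown!r}")
    return columns


def build_select(table: str, filters: dict[str, object]) -> tuple[str, list[object]]:
    """Build a SELECT whose identifiers are validated and whose values are bound."""
    validate_table(table)
    columns = validate_columns(table, list(filters))
    where = " AND ".join(f"{col} = ?" for col in columns)
    sql = f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")
    return sql, list(filters.values())
```

A call such as build_select("events", {"id": 7}) yields a parameterized statement, while an injected table or column name raises ValueError before any SQL is assembled.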

Task 0.2: Deploy Security Fix (1 day)

Create migration 005 (add record_status field)
Test migration on staging
Deploy to production
Verify all repositories work correctly
Monitor for issues
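As a rough sketch of what migration 005 and the soft delete pattern might look like, the example below uses Python's built-in sqlite3 module as a stand-in for the real database. The events table, the 'active' and 'deleted' status values, and the helper names are assumptions made for illustration; the actual migration is defined in the security fixes plan.

```python
import sqlite3

# Assumed shape of migration 005: a status column that defaults to 'active'.
MIGRATION_005 = (
    "ALTER TABLE events ADD COLUMN record_status TEXT NOT NULL DEFAULT 'active'"
)


def apply_migration(conn: sqlite3.Connection) -> None:
    """Add the record_status column to the (hypothetical) events table."""
    conn.execute(MIGRATION_005)


def soft_delete(conn: sqlite3.Connection, event_id: int) -> None:
    """Mark a row as deleted instead of issuing a hard DELETE."""
    conn.execute(
        "UPDATE events SET record_status = 'deleted' WHERE id = ?", (event_id,)
    )


def active_events(conn: sqlite3.Connection) -> list[tuple]:
    """Return only rows that have not been soft-deleted."""
    cursor = conn.execute(
        "SELECT id, name FROM events WHERE record_status = 'active' ORDER BY id"
    )
    return cursor.fetchall()
```

Existing rows pick up the default status when the column is added, so reads that filter on record_status keep returning them until they are explicitly soft-deleted.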

Blocker Resolution: Once complete, proceed to Sprint 1

Sprint 1: Architectural Foundations (Weeks 2-5)

Duration: 4 weeks
Priority: 🔴 P0 - Critical for Progress

Goal: Refactor God classes that block all future improvements.

Task 1.1: Refactor api_enrichment.py (3 weeks)

Current State: 2068 lines, 800-line function, 10+ responsibilities

Target State: 14 focused modules using Chain of Responsibility + Observer patterns

Files to Create (from design document):
1. backend/epgoat/services/enrichment/base.py - Abstract enrichment strategy
2. backend/epgoat/services/enrichment/strategies/ - 8 strategy modules
3. backend/epgoat/services/enrichment/observers/ - 4 observer modules
4. backend/epgoat/services/enrichment/factory.py - Strategy factory
5. backend/epgoat/services/enrichment/__init__.py - Public API

Implementation Phases:

Week 1: Setup + Core Abstraction
- Create base interfaces (EnrichmentStrategy, EnrichmentObserver)
- Create factory pattern
- Set up test infrastructure
- Move cost tracking to observer (500 lines → 200-line module)

Week 2: Extract Strategies (4 modules)
- TheSportsDBStrategy (600 lines → 200 lines)
- ESPNStrategy (400 lines → 150 lines)
- LocalDBStrategy (300 lines → 100 lines)
- FloSportsStrategy (200 lines → 80 lines)

Week 3: Extract Remaining Logic
- Team parsing module (300 lines → 150 lines)
- Time extraction module (200 lines → 100 lines)
- Cache management module (200 lines → 100 lines)
- Statistics learning observer (100 lines)

Week 4: Integration + Testing
- Wire all components together
- Comprehensive integration tests
- Performance testing
- Documentation
- Deploy refactored version

Success Criteria:
- ✅ No function >50 lines
- ✅ Each module <300 lines
- ✅ Single Responsibility Principle
- ✅ All tests passing
- ✅ Same functionality, cleaner code

Reference: See 2025-11-03-api-enrichment-refactoring-design.md for complete design
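To make the target shape concrete, here is a minimal sketch of how Chain of Responsibility and Observer could fit together. Only the names EnrichmentStrategy, EnrichmentObserver, and LocalDBStrategy come from this plan; the method signatures, EnrichmentChain, StubRemoteStrategy, and AttemptCounter are assumptions for illustration, and the design document remains the authority on the real interfaces.

```python
from abc import ABC, abstractmethod
from typing import Optional


class EnrichmentObserver(ABC):
    """Receives a notification after every strategy attempt."""

    @abstractmethod
    def on_attempt(self, strategy: str, channel: str, matched: bool) -> None: ...


class EnrichmentStrategy(ABC):
    """One link in the chain; returns None to pass the channel to the next link."""

    name = "base"

    @abstractmethod
    def enrich(self, channel: str) -> Optional[dict]: ...


class LocalDBStrategy(EnrichmentStrategy):
    """Looks the channel up in an in-memory dict standing in for the local DB."""

    name = "local_db"

    def __init__(self, known: dict[str, dict]) -> None:
        self._known = known

    def enrich(self, channel: str) -> Optional[dict]:
        return self._known.get(channel)


class StubRemoteStrategy(EnrichmentStrategy):
    """Hypothetical stand-in for a remote API strategy; makes no network call."""

    name = "stub_remote"

    def enrich(self, channel: str) -> Optional[dict]:
        return {"title": channel, "source": self.name}


class AttemptCounter(EnrichmentObserver):
    """Counts attempts per strategy, a simple proxy for cost tracking."""

    def __init__(self) -> None:
        self.attempts: dict[str, int] = {}

    def on_attempt(self, strategy: str, channel: str, matched: bool) -> None:
        self.attempts[strategy] = self.attempts.get(strategy, 0) + 1


class EnrichmentChain:
    """Tries strategies in order and stops at the first one that matches."""

    def __init__(
        self,
        strategies: list[EnrichmentStrategy],
        observers: list[EnrichmentObserver],
    ) -> None:
        self._strategies = strategies
        self._observers = observers

    def enrich(self, channel: str) -> Optional[dict]:
        for strategy in self._strategies:
            result = strategy.enrich(channel)
            for observer in self._observers:
                observer.on_attempt(strategy.name, channel, result is not None)
            if result is not None:
                return result
        return None
```

Because each strategy only decides whether it can handle a channel, adding a new data source means adding one class to the list rather than editing a long function, and cross-cutting concerns such as cost tracking stay in observers.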

Task 1.2: Refactor schedulers.py (3 days)

Current State: 464 lines, 131-line function build_schedule_for_channel()

Target State: <300 lines, all functions <50 lines

Implementation:
1. Extract _initialize_schedule() (15 lines)
2. Extract _process_events_for_day() (30 lines)
3. Extract _add_programming_blocks() (35 lines)
4. Extract _finalize_schedule() (25 lines)
5. Main function becomes 20-line orchestrator
6. Add comprehensive tests

Success Criteria:
- ✅ All functions <50 lines
- ✅ File <300 lines
- ✅ Clear separation of concerns
- ✅ All tests passing
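The extraction above can be sketched as an orchestrator that delegates to the four helpers. The helper names come from this plan, but their signatures, the Schedule dataclass, and the filler-block behaviour are assumptions chosen to keep the example small; the real build_schedule_for_channel() will differ.

```python
from dataclasses import dataclass, field

Block = tuple[int, int, str]  # (start_minute, end_minute, title); assumed shape


@dataclass
class Schedule:
    channel: str
    blocks: list[Block] = field(default_factory=list)


def _initialize_schedule(channel: str) -> Schedule:
    """Create an empty schedule for the channel."""
    return Schedule(channel)


def _process_events_for_day(schedule: Schedule, events: list[Block]) -> None:
    """Add the day's events in chronological order."""
    schedule.blocks.extend(sorted(events))


def _add_programming_blocks(
    schedule: Schedule, day_minutes: int = 1440, filler: str = "No Programming"
) -> None:
    """Fill the gaps before, between, and after events with filler blocks."""
    filled: list[Block] = []
    cursor = 0
    for start, end, title in schedule.blocks:
        if start > cursor:
            filled.append((cursor, start, filler))
        filled.append((start, end, title))
        cursor = max(cursor, end)
    if cursor < day_minutes:
        filled.append((cursor, day_minutes, filler))
    schedule.blocks = filled


def _finalize_schedule(schedule: Schedule) -> Schedule:
    """Drop zero-length blocks and return the finished schedule."""
    schedule.blocks = [b for b in schedule.blocks if b[1] > b[0]]
    return schedule


def build_schedule_for_channel(channel: str, events: list[Block]) -> Schedule:
    """Orchestrator: each step is a small helper that can be tested alone."""
    schedule = _initialize_schedule(channel)
    _process_events_for_day(schedule, events)
    _add_programming_blocks(schedule)
    return _finalize_schedule(schedule)
```

With this shape the main function reads as a list of steps, and each helper can be covered by its own unit tests.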

Sprint 2: Major File Refactoring (Weeks 6-8)

Duration: 3 weeks
Priority: 🟡 P1 - High Technical Debt

Goal: Split the top 10 oversized files (470-802 lines) into maintainable modules

Batch 2A: Utilities Layer (Week 6)

Task 2.1: Split refresh_event_db_v2.py (2 days)
- Current: 802 lines (167% over!)
- Target: 3 modules

New Structure:
1. utilities/event_refresh/db_client.py (200 lines) - D1 operations
2. utilities/event_refresh/transformer.py (250 lines) - Data transformation
3. utilities/event_refresh/batch_processor.py (200 lines) - Batch operations
4. utilities/event_refresh/__init__.py (50 lines) - Public API

Task 2.2: Split cli/run_provider.py (2 days)
- Current: 688 lines (129% over!)
- Target: 4 modules

New Structure:
1. cli/commands/validate.py (150 lines) - Validation logic
2. cli/commands/refresh.py (150 lines) - Refresh commands
3. cli/commands/generate.py (150 lines) - Generation logic
4. cli/runner.py (150 lines) - Main orchestrator

Task 2.3: Split event_database.py (2 days)
- Current: 648 lines (116% over!), 215-line function!
- Target: 3 modules

New Structure:
1. data/event_database/crud.py (200 lines) - CRUD operations
2. data/event_database/matcher.py (200 lines) - Event matching (extract 215-line function!)
3. data/event_database/refresh.py (200 lines) - Refresh logic

Batch 2B: Core & Clients (Week 7)

Task 2.4: Split backend/epgoat/domain/parsers.py (2 days)
- Current: 589 lines (96% over!), 159-line function
- Target: 3 modules

New Structure:
1. core/parsers/time_parser.py (200 lines) - Time extraction (extract 159-line function!)
2. core/parsers/m3u_parser.py (200 lines) - M3U parsing
3. core/parsers/team_parser.py (150 lines) - Team extraction

Task 2.5: Split clients/api_client.py (2 days)
- Current: 586 lines (95% over!)
- Target: 3 modules

New Structure:
1. clients/base_client.py (150 lines) - Base request handling
2. clients/thesportsdb_client.py (200 lines) - TheSportsDB specific
3. clients/event_matcher.py (200 lines) - Event matching logic

Batch 2C: Services Layer (Week 8)

Task 2.6: Split match_manager.py (1 day)
- Current: 533 lines, 113-line function
- Target: 2 modules (match_manager.py + validation.py)

Task 2.7: Split event_details_cache.py (1 day)
- Current: 527 lines
- Target: 2 modules (cache.py + storage.py)

Task 2.8: Split match_learner.py (1 day)
- Current: 522 lines
- Target: 2 modules (learner.py + algorithms.py)

Task 2.9: Split analyze_mismatches.py (1 day)
- Current: 501 lines, 4 long functions
- Target: 2 modules (analyzer.py + excel_export.py)

Task 2.10: Split mismatch_tracker.py (1 day)
- Current: 470 lines, 3 long functions
- Target: 2 modules (tracker.py + reporter.py)

Sprint 3: Medium Refactoring (Weeks 9-10)

Duration: 2 weeks
Priority: 🟡 P2 - Medium Technical Debt

Goal: Refactor 20+ files in the 300-500 line range

Batch 3A: Services (Week 9)

Files to refactor (extract long functions, add error handling):
1. family_league_inference.py (434L) - 3 long functions
2. logo_generator.py (322L) - 1 long function (100L)
3. match_debug_logger.py (459L) - 1 long function (180L!)
4. match_suggestions.py (382L) - 1 long function
5. provider_config_manager.py (474L) - 3 long functions
6. provider_orchestrator.py (394L) - 1 long function (90L)
7. scoped_team_extractor.py (313L) - 1 long function (95L)
8. enhanced_match_cache.py (304L) - Add error handling

Approach: Extract helper methods, add try/except blocks, improve logging

Estimated: 1-2 days per file, batch processing

Batch 3B: Data & Database (Week 10)

Files to refactor:
1. enhanced_event_matcher.py (363L) - 3 long functions
2. enhanced_team_matcher.py (460L) - 2 long functions + error handling
3. database/connection.py (369L) - 2 long functions
4. database/migration_runner.py (386L) - 1 long function
5. parsers/provider_m3u_parser.py (370L) - 1 long function
6. clients/espn_api_client.py (396L) - 1 long function (159L!)
7. clients/tv_schedule_client.py (461L) - 3 long functions

Approach: Extract methods, add error handling, improve structure

Sprint 4: Polish & Error Handling (Weeks 11-12)

Duration: 2 weeks
Priority: 🟢 P3 - Quality Improvements

Goal: Add error handling, fix minor issues, improve documentation

Task 4.1: Add Error Handling (Week 11)

30+ files missing try/except blocks:

High Priority (database/API operations):
- database_interface.py
- enhanced_team_matcher.py
- All utilities/*.py scripts
- xmltv.py
- All __init__.py files

Approach:
1. Identify risky operations (file I/O, network, database)
2. Add specific exception handling
3. Add logging for debugging
4. Add graceful degradation where possible

Estimated: 30 files × 30 minutes = 15 hours
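The approach in Task 4.1 can be illustrated with a small file I/O example. The function load_cache and its JSON cache file are hypothetical; the point is the pattern of catching specific exceptions, logging with context, and degrading to a safe default rather than wrapping everything in a bare except.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_cache(path: Path) -> dict:
    """Load a JSON cache file, degrading to an empty cache on expected failures."""
    try:
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError:
        # Expected on first run: not an error, just start with nothing cached.
        logger.info("Cache file %s not found; starting empty", path)
        return {}
    except json.JSONDecodeError as exc:
        # Corrupt file: log enough to debug, then continue without the cache.
        logger.warning("Cache file %s is not valid JSON (%s); ignoring it", path, exc)
        return {}
    if not isinstance(data, dict):
        logger.warning("Cache file %s has unexpected top-level type; ignoring it", path)
        return {}
    return data
```

Unexpected errors (for example a permissions problem) are deliberately not caught here, so they surface instead of being silently swallowed.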

Task 4.2: Fix Minor Issues (Week 11)

Fix quick wins:
1. regex_matcher.py - Complete missing stages or remove docs (2 hours)
2. enhanced_league_inference.py - Fix type hint any → Any (5 min)
3. audit_csv.py - Extract 2 long functions (2 hours)
4. bidirectional_matcher.py - Add error handling (1 hour)
5. family_discovery.py - Extract 2 long functions (2 hours)

Estimated: 8-10 hours
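For the any → Any fix, the snippet below shows the difference. Lowercase any is the built-in function that tests an iterable, not a type, so a static type checker is expected to reject it in an annotation; typing.Any is the intended "unconstrained" type. The function normalize_payload is a hypothetical example, not code from enhanced_league_inference.py.

```python
from typing import Any


def normalize_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Lower-case the keys of a payload whose values may be of any type."""
    return {key.lower(): value for key, value in payload.items()}


# The built-in `any` is a function over iterables, which is why it is not
# a valid annotation:
has_match = any(score > 0.9 for score in [0.2, 0.95])
```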

Task 4.3: Documentation Sweep (Week 12)

Add missing docstrings:
1. Audit all public functions
2. Add Google-style docstrings
3. Add module-level docstrings
4. Update examples in docstrings
5. Verify all Args/Returns/Raises documented

Estimated: 3-4 days
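As a reference point for the sweep, a Google-style docstring with Args, Returns, Raises, and an example might look like the following. The function clamp_duration is invented for illustration and does not exist in the codebase.

```python
def clamp_duration(minutes: int, lower: int = 30, upper: int = 240) -> int:
    """Clamp an event duration to a plausible range.

    Args:
        minutes: Raw duration in minutes, as parsed from the source data.
        lower: Smallest duration to allow, in minutes.
        upper: Largest duration to allow, in minutes.

    Returns:
        The duration limited to the inclusive range ``[lower, upper]``.

    Raises:
        ValueError: If ``lower`` is greater than ``upper``.

    Example:
        >>> clamp_duration(500)
        240
    """
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return max(lower, min(minutes, upper))
```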

Task 4.4: Final Validation (Week 12)

Run complete CI/CD pipeline:
1. Black formatting (100% compliance)
2. Ruff linting (0 violations)
3. mypy type checking (0 errors)
4. pytest (all tests passing)
5. Coverage report (>80%)
6. Security scan (bandit, semgrep)

Create final report:
- Before/after metrics
- Technical debt eliminated
- Remaining known issues
- Recommendations
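One way to run the validation steps locally is a small driver script like the sketch below. The exact commands are assumptions to adapt to the project's configuration: the coverage flags require the pytest-cov plugin, the paths are placeholders, and the project's real CI definition takes precedence.

```python
import subprocess
from typing import Callable

# Assumed command lines; adjust paths and options to the project's setup.
CHECKS: list[tuple[str, list[str]]] = [
    ("black", ["black", "--check", "."]),
    ("ruff", ["ruff", "check", "."]),
    ("mypy", ["mypy", "."]),
    ("pytest", ["pytest", "--cov", "--cov-fail-under=80"]),  # needs pytest-cov
    ("bandit", ["bandit", "-r", "."]),
    ("semgrep", ["semgrep", "scan", "--config", "auto"]),
]


def run_checks(
    checks: list[tuple[str, list[str]]],
    runner: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> dict[str, bool]:
    """Run each check and record whether it exited with status 0."""
    results: dict[str, bool] = {}
    for name, command in checks:
        try:
            completed = runner(command, check=False)
        except FileNotFoundError:
            # Tool not installed: treat as a failed gate rather than crashing.
            results[name] = False
            continue
        results[name] = completed.returncode == 0
    return results


def all_passed(results: dict[str, bool]) -> bool:
    """The gate passes only if every check passed."""
    return all(results.values())
```

Calling run_checks(CHECKS) from the repository root returns a per-tool pass/fail map that can feed the before/after metrics in the final report. The runner parameter exists so the driver itself can be tested without invoking the tools.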

Implementation Guidelines

Development Workflow

For Each Task:
1. Create feature branch: refactor/task-N-description
2. Write failing tests (TDD)
3. Implement changes
4. Run tests (must pass)
5. Run linting/type checking
6. Commit with descriptive message
7. Create PR for review
8. Merge to main

Commit Message Format:

refactor(module): brief description

- Detailed change 1
- Detailed change 2

Relates to: Phase 2/3 Code Review
Files affected: X
Lines changed: +Y -Z

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Testing Strategy

Test Coverage Requirements:
- Critical path files: >90% coverage
- All other files: >80% coverage
- New modules: 100% coverage
- Integration tests for major refactors

Test Types:
1. Unit tests (isolated function behavior)
2. Integration tests (module interactions)
3. Security tests (SQL injection, input validation)
4. Performance tests (no regressions)
5. Regression tests (existing functionality)

Code Review Checkpoints

After Each Sprint:
1. Code review all changes
2. Run full test suite
3. Performance benchmarking
4. Security audit
5. Documentation review
6. Stakeholder demo

Review Criteria:
- ✅ All functions <50 lines
- ✅ All files <300 lines
- ✅ 100% type hints
- ✅ Google-style docstrings
- ✅ Error handling present
- ✅ Tests passing
- ✅ No security vulnerabilities

Risk Mitigation

High-Risk Changes

api_enrichment.py refactoring:
- Risk: Breaking production enrichment
- Mitigation:
  - Feature flag for new implementation
  - A/B testing for 1 week
  - Rollback plan ready
  - Monitor metrics closely

SQL injection fix:
- Risk: Breaking existing queries
- Mitigation:
  - Comprehensive test suite first
  - Staging deployment with real data
  - Gradual rollout
  - Emergency rollback prepared
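The feature flag and A/B testing mitigations for the api_enrichment.py refactoring could be combined in a percentage-based rollout, sketched below. The environment variable ENRICHMENT_V2_PERCENT and both function names are hypothetical; a real deployment might use a configuration service instead. Hashing the channel ID keeps each channel in the same group across runs, which makes old and new results comparable.

```python
import os
import zlib
from typing import Mapping


def rollout_bucket(key: str) -> int:
    """Map a key to a stable bucket in the range 0-99."""
    return zlib.crc32(key.encode("utf-8")) % 100


def use_new_enrichment(channel_id: str, env: Mapping[str, str] = os.environ) -> bool:
    """Decide whether this channel goes through the refactored implementation.

    The flag value is the percentage of channels (0-100) routed to the new
    code path. Missing or invalid values fall back to 0, i.e. the old path.
    """
    try:
        percent = int(env.get("ENRICHMENT_V2_PERCENT", "0"))
    except ValueError:
        percent = 0
    return rollout_bucket(channel_id) < percent
```

Setting the flag to 0 acts as an immediate rollback switch, and raising it in steps gives a gradual rollout without a redeploy.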

Rollback Strategy

For Each Major Change:
1. Tag release before deployment
2. Document rollback procedure
3. Test rollback in staging
4. Monitor metrics post-deployment
5. Keep rollback window open 24-48 hours

Resource Requirements

Team Allocation

Minimum: 1 senior engineer full-time
- Sprint 0: Security fix (solo work, critical)
- Sprint 1: api_enrichment.py (may need design review)
- Sprint 2-4: Can parallelize with additional engineers

Optimal: 2 engineers
- Engineer 1: Critical path (Sprint 0-1)
- Engineer 2: Medium refactoring (Sprint 2-3)
- Both: Sprint 4 polish + final validation

Time Estimates

Conservative (1 engineer):
- Sprint 0: 1 week
- Sprint 1: 4 weeks
- Sprint 2: 3 weeks
- Sprint 3: 2 weeks
- Sprint 4: 2 weeks
- Total: 12 weeks

Optimistic (2 engineers, parallel work):
- Sprint 0: 1 week (blocking)
- Sprint 1-2: 4 weeks (parallel)
- Sprint 3-4: 2 weeks (parallel)
- Total: 7-8 weeks

Realistic (1 engineer + reviews):
- Add 20% buffer for reviews, bugs, testing
- Total: 10-12 weeks

Success Metrics

Before Refactoring

Files >300 lines: 35 (28%)
Functions >50 lines: 60+ (unknown %)
Missing error handling: 30+ files
SQL injection: 1 critical vulnerability
God classes: 2 (api_enrichment.py, schedulers.py)
Test coverage: Unknown
Type hint coverage: ~95%

Target After Refactoring

Files >300 lines: 0 (0%)
Functions >50 lines: 0 (0%)
Missing error handling: 0 files
SQL injection: 0 vulnerabilities
God classes: 0
Test coverage: >80% overall, >90% critical path
Type hint coverage: 100%
Linting violations: 0
Security vulnerabilities: 0
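The size metrics above (files >300 lines, functions >50 lines) can be tracked with a short script built on the standard library's ast module. This is a sketch under the assumption that length is counted from the def line to the last line of the body; the original review may have counted differently, so the numbers it produces are not guaranteed to match the baseline exactly.

```python
import ast


def long_functions(source: str, max_lines: int = 50) -> list[tuple[str, int]]:
    """Return (name, length) for every function longer than max_lines."""
    found: list[tuple[str, int]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the `def` line through the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                found.append((node.name, length))
    return sorted(found)


def is_oversized(source: str, max_lines: int = 300) -> bool:
    """Return True if the file has more lines than the limit."""
    return len(source.splitlines()) > max_lines
```

Running both functions over each file's text (for example via pathlib.Path.read_text) gives counts that can be compared against the targets after every sprint.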

Quality Gates

Must Pass Before Next Sprint:
- ✅ All tests passing
- ✅ Code review approved
- ✅ No critical security issues
- ✅ Performance benchmarks met
- ✅ Documentation updated

Dependencies & Blockers

External Dependencies

None identified (all work is internal refactoring)

Internal Dependencies

- Sprint 0 blocks all other work: security must be fixed first
- Sprint 1 (api_enrichment.py) blocks:
  - Any enrichment improvements
  - New API integrations
  - Caching optimizations
- Sprint 2-3 can be parallelized if multiple engineers are available

Known Blockers

Need staging environment for testing migrations
Need production-like data for integration tests
May need stakeholder approval for major architectural changes

Appendix

Design Documents:
- API Enrichment Refactoring Design
- Sprint 1 Security Fixes Plan

Review Reports:
- Phase 2 Complete Findings
- Phase 3 Complete Findings
- Phase 2 Progress Tracker

Standards:
- Engineering Standards
- Python Standards
- Architecture Patterns

File Inventory

Phase 2 Critical Files (15 total):
- Matching Pipeline: api_enrichment.py, regex_matcher.py, enhanced_league_inference.py, patterns.py, league_inference.py
- Data Integrity: base_repository.py, event_repository.py, participant_repository.py, unmatched_channel_repository.py
- Database & API: connection.py, d1_client.py, thesportsdb_client.py (integrated)
- Core Pipeline: schedulers.py, xmltv.py, epg_generator.py

Phase 3 Files (112 total):
- Services: 24 files
- Data: 10 files
- Database: 6 files
- Utilities: 15 files
- Core: 7 files
- Models: 1 file
- Parsers: 3 files
- Clients: 4 files
- CLI: 2 files
- Tests: 36 files
- Root: 4 files

Total: 127 Python files

Execution Status

Current Status: ✅ Ready for Execution
Next Action: Begin Sprint 0 (Security Fix)
Approval Required: Yes (stakeholder sign-off on timeline + resources)
Estimated Start Date: TBD
Estimated Completion Date: TBD (10-12 weeks from start)

Plan Version: 1.0
Created: 2025-11-03
Author: Claude (AI Code Reviewer)
Approved By: [Pending]
Last Updated: 2025-11-03